Zurich by the Numbers - Predictive Insights into Tourism Dynamics
Authors
Affiliation
Name I, First Name I
University of Lausanne
Name II, First Name II
Published
May 1, 2024
Abstract
This project applies forecasting techniques to predict tourism trends in Zurich. The analysis combines historical data with forecasting algorithms to provide actionable insights into future tourism patterns. We prepare the data, explore several predictive models, and evaluate their forecasting accuracy in detail. The aim is to turn complex data into understandable, strategic information that supports decision-making in Zurich’s tourism sector.
1 DATA
1.1 Cleaning
1.1.1 Tourism Data - All
# Packages used below (assumed; the report's setup chunk is not shown)
library(tidyverse)
library(lubridate)
library(here)
library(plotly)

# Load the data from the file Dataset_tourism.xlsx in the data folder
tourism_data <- readxl::read_xlsx(here("data/Dataset_tourism.xlsx"))

# Remove 'Herkunftsland - Total' from column 'Herkunftsland', as it is just the total
tourism_data <- tourism_data %>%
  filter(Herkunftsland != "Herkunftsland - Total")

# Print the unique values in the month column
unique(tourism_data$Monat)
#>  [1] "Januar"    "Februar"   "März"      "April"     "Mai"
#>  [6] "Juni"      "Juli"      "August"    "September" "Oktober"
#> [11] "November"  "Dezember"

# Translate the German month names into English
tourism_data$Monat <- tourism_data$Monat %>%
  recode_factor("Januar" = "January",
                "Februar" = "February",
                "März" = "March",
                "April" = "April",
                "Mai" = "May",
                "Juni" = "June",
                "Juli" = "July",
                "August" = "August",
                "September" = "September",
                "Oktober" = "October",
                "November" = "November",
                "Dezember" = "December")

# Add a Date column for plotting purposes
tourism_data <- tourism_data %>%
  mutate(Date = dmy(paste("01", Monat, Jahr)))

# Check for NA values
sum(is.na(tourism_data))
#> [1] 51395

# Inspect the NA values: where are they?
(tourism_data %>%
  filter(is.na(value)))
#> # A tibble: 51,395 x 6
#>    Herkunftsland                  Kanton Monat Jahr  value Date
#>    <chr>                          <chr>  <fct> <chr> <dbl> <date>
#>  1 Malta                          Schwe~ Janu~ 2005     NA 2005-01-01
#>  2 Zypern                         Schwe~ Janu~ 2005     NA 2005-01-01
#>  3 Mexiko                         Schwe~ Janu~ 2005     NA 2005-01-01
#>  4 Übriges Zentralamerika, Karib~ Schwe~ Janu~ 2005     NA 2005-01-01
#>  5 Bahrain                        Schwe~ Janu~ 2005     NA 2005-01-01
#>  6 Katar                          Schwe~ Janu~ 2005     NA 2005-01-01
#>  7 Kuwait                         Schwe~ Janu~ 2005     NA 2005-01-01
#>  8 Australien                     Schwe~ Janu~ 2005     NA 2005-01-01
#>  9 Neuseeland, Ozeanien           Schwe~ Janu~ 2005     NA 2005-01-01
#> 10 Oman                           Schwe~ Janu~ 2005     NA 2005-01-01
#> # i 51,385 more rows

# Show the first 1,000 rows in an interactive table using reactable
reactable::reactable(head(tourism_data, 1000))
1.1.2 Tourism Data - Zurich
# Filter column 'Kanton' for Zurich
tourism_data_zurich <- tourism_data %>%
  filter(Kanton == "Zürich")

# Check for NA values
sum(is.na(tourism_data_zurich))
#> [1] 1869

# Inspect the NA values: where are they?
tourism_data_zurich %>%
  filter(is.na(value))
#> # A tibble: 1,869 x 6
#>    Herkunftsland                  Kanton Monat Jahr  value Date
#>    <chr>                          <chr>  <fct> <chr> <dbl> <date>
#>  1 Malta                          Zürich Janu~ 2005     NA 2005-01-01
#>  2 Zypern                         Zürich Janu~ 2005     NA 2005-01-01
#>  3 Mexiko                         Zürich Janu~ 2005     NA 2005-01-01
#>  4 Übriges Zentralamerika, Karib~ Zürich Janu~ 2005     NA 2005-01-01
#>  5 Bahrain                        Zürich Janu~ 2005     NA 2005-01-01
#>  6 Katar                          Zürich Janu~ 2005     NA 2005-01-01
#>  7 Kuwait                         Zürich Janu~ 2005     NA 2005-01-01
#>  8 Australien                     Zürich Janu~ 2005     NA 2005-01-01
#>  9 Neuseeland, Ozeanien           Zürich Janu~ 2005     NA 2005-01-01
#> 10 Oman                           Zürich Janu~ 2005     NA 2005-01-01
#> # i 1,859 more rows

# Show the first 1,000 rows in an interactive table using reactable
reactable::reactable(head(tourism_data_zurich, 1000))
1.1.3 Tourism Data - Zurich and Philippines
# Keep only visitors from the Philippines
tourism_data_zurich_philippines <- tourism_data_zurich %>%
  filter(Herkunftsland == "Philippinen")

# Show the data in an interactive table using reactable
reactable::reactable(tourism_data_zurich_philippines)
1.1.4 Dealing with Missing Values
The data filtered for Zurich and the Philippines contains no missing values. If it did, we would handle them as follows:
1.1.4.1 Impute Missing Values with ARIMA
If the missing values are random or if excluding them would result in a loss of valuable information, we might consider imputing them. One common approach is to use statistical models like ARIMA to interpolate missing values based on the patterns observed in the available data.
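# # Requires the tsibble and fable packages
# # Create a monthly tsibble and turn implicit gaps into explicit NA rows.
# # The index must be a year-month, and Monat/Jahr must not be keys,
# # otherwise every series would contain a single observation.
# data <- tourism_data_zurich_philippines %>%
#   mutate(Month = yearmonth(Date)) %>%
#   as_tsibble(index = Month, key = c(Kanton, Herkunftsland)) %>%
#   select(Month, value) %>%
#   fill_gaps()
#
# # Fit an ARIMA model to the data with missing values
# model_fit <- data %>%
#   model(ARIMA(value))
#
# # Interpolate missing values using the fitted ARIMA model
# filled_data <- model_fit %>%
#   interpolate(data)
#
# # Print the data with the missing values filled in
# print(filled_data)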
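Because the Zurich and Philippines series has no gaps, the idea can only be illustrated on other data. The following is a minimal sketch, not the method used in this report: it uses base R's `arima()` and `KalmanSmooth()` instead of the fable workflow, the built-in `AirPassengers` series as a stand-in for monthly tourism counts, and an arbitrarily chosen model order and set of removed observations.

```r
# Sketch: ARIMA-based imputation in base R (no extra packages needed).
# AirPassengers is only a stand-in monthly series, not the tourism data.
y <- log(AirPassengers)

# Remove a few observations to simulate gaps (positions chosen arbitrarily)
missing_idx <- c(20, 21, 57, 100)
y_gappy <- y
y_gappy[missing_idx] <- NA

# Fit a seasonal ARIMA; arima() handles NA values through its state-space form.
# The order (0,1,1)(0,1,1)[12] is an illustrative choice, not a tuned model.
fit <- arima(y_gappy,
             order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12),
             method = "ML")

# Kalman smoothing estimates the state at every time point; multiplying by
# the observation vector Z maps the states back to the scale of the series.
smoothed <- KalmanSmooth(y_gappy, fit$model)$smooth %*% fit$model$Z

# Fill only the gaps, leaving the observed values untouched
y_filled <- y_gappy
y_filled[missing_idx] <- smoothed[missing_idx]

# Compare the imputed values with the ones that were removed
round(cbind(actual = y[missing_idx], imputed = y_filled[missing_idx]), 3)
```

Only the missing positions are overwritten, so the observed data stay exactly as they were. Comparing the imputed values against the removed ones gives a quick sanity check on whether the chosen model captures the seasonal pattern well enough for interpolation.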
2 EDA - Zurich
2.1 Zurich and All visiting countries
# Preparing the data
# Remove 'Schweiz' from column 'Herkunftsland', as it is just the whole of Switzerland
tourism_data_zurich <- tourism_data_zurich %>%
  filter(Herkunftsland != "Schweiz")

data <- tourism_data_zurich %>%
  filter(!is.na(value)) %>%                        # Remove rows with NA in the 'value' column
  mutate(Monat = month(Date, label = TRUE, abbr = TRUE),  # Extract month from Date
         Jahr = year(Date)) %>%                    # Extract year from Date
  group_by(Herkunftsland, Date) %>%                # Group by country and date
  summarise(Trips = sum(value), .groups = 'drop')  # Sum trips for each country per date

p <- ggplot(data, aes(x = Date, y = Trips, group = Herkunftsland,
                      color = Herkunftsland == "Philippinen",
                      text = paste("Country:", Herkunftsland, "<br>Trips:", Trips))) +  # Text for tooltip
  geom_line(show.legend = FALSE) +
  scale_color_manual(values = c("TRUE" = "red", "FALSE" = "grey")) +
  labs(title = "Number of Trips from Each Country to Zurich",
       x = "Date",
       y = "Number of Trips") +
  theme_minimal()

# Convert to an interactive plotly object
interactive_plot <- ggplotly(p, tooltip = "text")

# Adjust plotly settings
interactive_plot <- interactive_plot %>%
  layout(margin = list(l = 60, r = 60, b = 60, t = 80),  # Adjust margins
         legend = list(orientation = "h", x = 0, xanchor = "left", y = -0.2))  # Adjust legend position

# Display the interactive plot
interactive_plot
2.2 Zurich and Philippines Visitors
# Plot 'value' on the y axis against 'Date' on the x axis
p <- ggplot(tourism_data_zurich_philippines, aes(x = Date, y = value)) +
  geom_line() +
  labs(title = "Number of Trips from the Philippines to Zurich",
       x = "Date",
       y = "Number of Trips") +
  theme_minimal()

p
2.2.1 Pattern
2.2.1.1 Decompose
# Convert data to a time series object
tourism_ts <- tourism_data_zurich_philippines %>%
  arrange(Date) %>%
  # Ensure data is complete and monthly
  complete(Date = seq.Date(min(Date), max(Date), by = "month")) %>%
  # Replace NA values if there are any
  replace_na(list(value = 0)) %>%
  # Create a time series object
  with(ts(value, frequency = 12, start = decimal_date(min(Date))))

# Decompose the time series
decomposed <- decompose(tourism_ts)

# Plot the decomposed components
plot(decomposed)